feat(run): emit tool_catalog NDJSON event at session start by byapparov · Pull Request #88 · aictrl-dev/cli

byapparov · 2026-06-22T03:49:02Z

Summary

- Emits a tool_catalog NDJSON event immediately after session_start and before the first model turn, listing every resolved builtin tool, MCP tool, and skill with version.
- Adds MCP.toolEntries(), a lightweight sibling to MCP.tools() that returns {toolKey, serverName}[] without building full AI-SDK tool objects.
- Threads version from SKILL.md frontmatter through Skill.Info (new optional field; null in the event when not declared).
- New src/cli/cmd/tool-catalog.ts with injectable deps for testing; 9 TDD tests cover builtin/mcp shape, the consumer gate ("tool exposed vs missing"), skill version presence/absence, and tools/skills separation.
- EVENTS.md updated with full schema documentation.
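
To make the consumer gate concrete, here is a minimal sketch of how a reader of the NDJSON stream might check that a required tool was exposed. The payload field names `id`, `server`, and `name` are assumptions for illustration (the authoritative schema is in EVENTS.md), and `parseCatalogLine` and `missingTools` are hypothetical helpers, not part of this PR.

```typescript
// Hypothetical payload shape. `tools`, `skills`, `source`, and `version`
// appear in the PR text; `id`, `server`, and `name` are assumed names.
type CatalogTool =
  | { id: string; source: "builtin" }
  | { id: string; source: "mcp"; server: string }

interface CatalogSkill {
  name: string
  version: string | null // null when SKILL.md declares no version
}

interface ToolCatalogEvent {
  type: "tool_catalog"
  tools: CatalogTool[]
  skills: CatalogSkill[]
}

// Parse one NDJSON line; returns undefined for any other event type.
function parseCatalogLine(line: string): ToolCatalogEvent | undefined {
  const parsed: unknown = JSON.parse(line)
  if (
    typeof parsed === "object" &&
    parsed !== null &&
    (parsed as { type?: unknown }).type === "tool_catalog"
  ) {
    return parsed as ToolCatalogEvent
  }
  return undefined
}

// Consumer gate: list the required tool IDs that the catalog does not contain.
function missingTools(event: ToolCatalogEvent, required: string[]): string[] {
  const exposed = new Set(event.tools.map((tool) => tool.id))
  return required.filter((id) => !exposed.has(id))
}
```

An empty result from `missingTools` means every required tool appeared in the catalog; the check never inspects the model's prose output.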

Test plan

- bun test test/tool-catalog.test.ts: 9/9 pass (RED→GREEN TDD)
- bun test full suite: 1294 pass, 0 fail
- bun run typecheck (turbo, all packages): clean

Skill version gap

Skill.Info.version is now threaded through from the SKILL.md frontmatter version field. Skills that don't declare a version emit "version": null. No fabricated versions — if a skill pack needs version tracking, add version: x.y.z to its frontmatter.

Closes #85

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ

Emits a single `tool_catalog` event immediately after `session_start` and before the first model turn so the server-side completion gate can structurally verify that required MCP tools (e.g. `record_finding`) were actually exposed to the model, without string-matching prose output.

Changes:
- `src/cli/cmd/tool-catalog.ts` (new): `buildToolCatalogItems()` collects builtin tool IDs via `ToolRegistry.ids()` and MCP entries via the new `MCP.toolEntries()`, returning `{ tools, skills }` with injectable deps for unit testing.
- `src/mcp/index.ts`: adds `MCP.toolEntries()`, a lightweight sibling to `tools()` that returns `{ toolKey, serverName }[]` without building full AI-SDK tool objects; used by `buildToolCatalogItems()`.
- `src/skill/skill.ts`: extends `Skill.Info` with optional `version` field threaded from SKILL.md frontmatter; `null` when not declared.
- `src/cli/cmd/run.ts`: `await buildToolCatalogItems()` + `emit("tool_catalog", ...)` inserted between `session_start` and the first `loop()` call.
- `test/tool-catalog.test.ts` (new, TDD): 9 tests covering builtin/mcp tool shape, the "tool exposed vs missing" consumer gate, skill version presence/absence, and skills/tools separation.
- `EVENTS.md`: documents the new event schema including the version gap note.

Skill version gap: `Skill.Info.version` is now threaded through from frontmatter. Skills without a `version` field in their SKILL.md emit `version: null`.

Closes #85

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ

…capability (#88)

Instead of a one-off `version` field bolt-on, `Skill.Info` now carries a `metadata: Record<string, string>` field that retains every SKILL.md frontmatter key beyond `name`/`description`, with values string-coerced (handles YAML numbers like `version: 1.0` → `"1.0"`). `version` becomes a derived convenience accessor (`metadata.version`) rather than a parallel extraction path. The dual parse path is gone: one coherent pass builds `metadata`, then derives `version` from it.

Changes:
- `Skill.Info`: add `metadata: z.record(z.string(), z.string()).default({})`; keep `version: z.string().optional()` as derived accessor
- Loader: replace separate `"version" in md.data` extraction with a single `Object.fromEntries(…filter name/description…map String)` pass
- Tests: add "non-version frontmatter keys survive in skill.metadata" test (license+author keys survive; name/description excluded; version still resolves via metadata.version), written RED first, then GREEN

`tool_catalog` event payload unchanged: still emits `version` sourced from `metadata.version`. `metadata` not added to event payload (kept lean per spec).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
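
The single-pass coercion described in this commit could look roughly like the sketch below. `buildMetadata` and `deriveVersion` are hypothetical names, and the input is assumed to be an already-parsed frontmatter object; the loader's real code is not shown on this page.

```typescript
// Assumed input: frontmatter already parsed into a plain object.
type Frontmatter = Record<string, unknown>

// One pass: drop name/description, string-coerce every remaining value.
function buildMetadata(data: Frontmatter): Record<string, string> {
  return Object.fromEntries(
    Object.entries(data)
      .filter(([key]) => key !== "name" && key !== "description")
      .map(([key, value]): [string, string] => [key, String(value)]),
  )
}

// `version` is derived from metadata rather than extracted separately.
function deriveVersion(metadata: Record<string, string>): string | undefined {
  return metadata["version"]
}
```

Deriving `version` from `metadata` keeps a single extraction path, so any key that survives the filter is available to both.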

aictrl-dev · 2026-06-22T07:58:54Z

Code review

Verdict: Address the major findings before merging. · 🔴 0 · 🟠 2 · 🟡 3 · ⚪ 0 · 0/5 resolved

🟡 packages/cli/src/cli/cmd/run.ts:710 — Catalog built even when output is not JSON
🟠 packages/cli/src/cli/cmd/run.ts:716-718 — Catalog errors swallowed with no logging
🟠 packages/cli/src/cli/cmd/tool-catalog.ts:90 — Catalog builtins are unfiltered, not the dispatched set
🟡 packages/cli/src/mcp/index.ts:670-677 — listTools() runs twice per session start
🟡 packages/cli/src/mcp/index.ts:680-687 — MCP key derivation duplicated in toolEntries() and tools()

🤖 Fix all 5 open findings with your agent

Fix the following code review findings on aictrl-dev/cli PR #88 (head branch).
Run the relevant tests/linters after each change.

1. packages/cli/src/cli/cmd/run.ts:710 — Catalog built even when output is not JSON
   Detail: `emit()` only writes when `args.format === "json"`, but the catalog block unconditionally awaits `buildToolCatalogItems()` — which enumerates every MCP server via `listTools()` and scans the skill filesystem — even in the default human-readable mode where the result is never emitted. For interactive (non-JSON) sessions this adds unnecessary MCP round-trips and startup latency before the first turn.

Guard the collection on `args.format === "json"` (matching how the result is consumed), e.g. `if (args.format === "json") { await buildToolCatalogItems().then(...).catch(...) }`.
2. packages/cli/src/cli/cmd/run.ts:716-718 — Catalog errors swallowed with no logging
   Detail: The `.catch(() => {})` handler silently drops every error from `buildToolCatalogItems()`. The entire purpose of this event (per EVENTS.md) is to let the server-side gate detect "silent success" failure modes — but if catalog collection itself fails (e.g. `Skill.all()` throws on a malformed SKILL.md, `MCP.clients()` rejects, or `ToolRegistry.ids()` fails), the run completes with NO `tool_catalog` event at all and NO signal that anything went wrong. That is a brand-new silent-success path created by the very feature meant to eliminate them.

Because `buildToolCatalogItems` uses `Promise.all` over builtins, MCP, and skills, a single failing source (most likely `Skill.all()` hitting the filesystem) suppresses the whole catalog including the unrelated, healthy tool lists.

Please log the error (at least `log.warn`/`log.error`) inside the catch so operators can diagnose missing catalogs, while still keeping the session alive.
   Suggested fix:
       .catch((err) => {
         // Never let catalog collection block or crash the session, but
         // surface the failure so a missing event is diagnosable.
         log.error("failed to emit tool_catalog", { error: err instanceof Error ? err.message : String(err) })
       })
3. packages/cli/src/cli/cmd/tool-catalog.ts:90 — Catalog builtins are unfiltered, not the dispatched set
   Detail: `buildToolCatalogItems()` resolves builtins via `ToolRegistry.ids()`, which returns the instance-level superset with NO model/agent filtering. But `resolveTools` (session/prompt.ts) dispatches via `ToolRegistry.tools({modelID, providerID}, agent, sessionID)`, which applies real per-model filters:

- `apply_patch` is only exposed for gpt-* models, and `edit`/`write` are HIDDEN for those same models; they are mutually exclusive at dispatch, yet the catalog will always report BOTH.
- `codesearch`/`websearch` are gated on `providerID === "aictrl" || AICTRL_ENABLE_EXA`, but the catalog always lists them.

EVENTS.md overstates the guarantee: it claims the gate "uses `tools[]` to verify that [tools] were actually in the model's function list at dispatch time." That is true for MCP tools (record_finding, etc.), but NOT for the builtin portion. A consumer that structurally checks for `edit` (or absence of `apply_patch`) on a gpt run will get wrong answers.

The author already notes "instance-level superset, without model-specific filtering" in a code comment — so the fix is either (a) pass model+agent into `buildToolCatalogItems` and call `ToolRegistry.tools(...)` so the catalog matches dispatch, or (b) correct EVENTS.md to scope the claim to MCP tools only.
4. packages/cli/src/mcp/index.ts:670-677 — listTools() runs twice per session start
   Detail: `MCP.toolEntries()` calls `client.listTools()` on every connected client at catalog time (before the first turn). Moments later, the first turn's `resolveTools` → `MCP.tools()` calls `client.listTools()` again on the same clients. For sessions with several MCP servers this doubles the round-trips to each server at session start, and for slow/remote MCP servers it adds latency directly on the critical path before the first model turn (the catalog `await` blocks `loop()`).

Consider memoizing the listTools result on the client/state for the session, or having `tools()` produce the entries so the work happens once.
5. packages/cli/src/mcp/index.ts:680-687 — MCP key derivation duplicated in toolEntries() and tools()
   Detail: `MCP.toolEntries()` re-implements the connected-client iteration and `sanitizedClientName + "_" + sanitizedToolName` key derivation that already lives in `MCP.tools()`. The catalog's value depends entirely on those keys matching what `resolveTools` exposes to the model (which comes from `MCP.tools()`'s keys). The two implementations must stay byte-for-byte in sync; if either sanitization changes, the catalog silently stops matching dispatch and the consumer gate produces false negatives/positives.

Extract a single shared helper (e.g. `mcpToolKey(clientName, toolName)` plus a shared `connectedClients()` iterator) so there is one source of truth, or have `tools()` delegate to `toolEntries()` for keying.
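
As an illustration of finding 5, a shared key helper might be sketched as follows. The sanitization regex is an assumption (the real rule in `src/mcp/index.ts` is not shown in this review), and `toolEntries` here takes a plain map of server name to tool names rather than live MCP clients.

```typescript
// Assumed sanitization rule, for illustration only.
const sanitize = (value: string): string => value.replace(/[^a-zA-Z0-9_-]/g, "_")

// Single source of truth for the key exposed to the model.
function mcpToolKey(clientName: string, toolName: string): string {
  return sanitize(clientName) + "_" + sanitize(toolName)
}

interface ToolEntry {
  toolKey: string
  serverName: string
}

// Catalog path: builds lightweight entries using the same key helper
// that the dispatch path would call.
function toolEntries(toolsByServer: Record<string, string[]>): ToolEntry[] {
  return Object.entries(toolsByServer).flatMap(([serverName, toolNames]) =>
    toolNames.map((toolName) => ({
      toolKey: mcpToolKey(serverName, toolName),
      serverName,
    })),
  )
}
```

With both paths calling `mcpToolKey`, a change to the sanitization rule cannot make the catalog keys drift from the dispatched keys.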

📋 Out-of-diff findings (5)

| Sev | Location | Finding |
| --- | --- | --- |
| 🟡 | `packages/cli/src/cli/cmd/run.ts:710` | Catalog built even when output is not JSON |
| 🟠 | `packages/cli/src/cli/cmd/run.ts:716-718` | Catalog errors swallowed with no logging |
| 🟠 | `packages/cli/src/cli/cmd/tool-catalog.ts:90` | Catalog builtins are unfiltered, not the dispatched set |
| 🟡 | `packages/cli/src/mcp/index.ts:670-677` | listTools() runs twice per session start |
| 🟡 | `packages/cli/src/mcp/index.ts:680-687` | MCP key derivation duplicated in toolEntries() and tools() |

Reviewed 6 files · 0 inline · view all 5 findings ↗

aictrl · AI code review for fast-moving teams · aictrl.dev

- run.ts: guard buildToolCatalogItems on args.format==="json" so MCP listTools + skill scan are skipped for interactive sessions (#88 🟡)
- run.ts: log catalog build failures in catch instead of silencing them so a missing tool_catalog event is diagnosable (#88 🟠)
- mcp/index.ts: extract mcpToolKey() helper shared by tools() and toolEntries(), a single source of truth for key derivation (#88 🟡)
- EVENTS.md: document that builtin tools[] reflects the instance-level superset, not the per-model filtered dispatch set (#88 🟠)
- test: regression tests for error propagation and key sanitization

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ

byapparov · 2026-06-22T08:06:21Z

Review response — PR #88

All 5 findings triaged. 2 fixed directly, 1 fixed via EVENTS.md correction, 1 fixed by extracting shared key helper (🟡 duplication), 1 deferred (🟡 double-listTools — substantially mitigated by the JSON-mode guard).

Issues addressed (pushed to this PR)

🟠 Catalog errors swallowed with no logging — packages/cli/src/cli/cmd/run.ts:715-718: replaced .catch(() => {}) with .catch((err) => { console.error("failed to emit tool_catalog", ...) }). Error is now surfaced so a missing tool_catalog event is diagnosable, while still keeping the session alive. (commit 1a10f2306)
🟠 Catalog builtins are unfiltered, not the dispatched set — EVENTS.md: option (b) — corrected the documentation to explicitly state that source:"builtin" entries reflect the instance-level superset and that per-model filters (apply_patch/edit/write gating, codesearch/websearch providerID gate) are applied at dispatch time and not reflected in the catalog. The strong structural guarantee now scopes correctly to MCP tools only. (commit 1a10f2306)
🟡 Catalog built even when output is not JSON — packages/cli/src/cli/cmd/run.ts:711: wrapped the entire buildToolCatalogItems() block in if (args.format === "json"). MCP listTools() round-trips and skill filesystem scan no longer run in interactive (non-JSON) sessions. (commit 1a10f2306)
🟡 MCP key derivation duplicated in toolEntries() and tools() — packages/cli/src/mcp/index.ts: extracted mcpToolKey(clientName, toolName) private helper; both tools() and toolEntries() now call it. One source of truth — if the sanitization regex ever changes, both paths update together. (commit 1a10f2306)

Review claims verified false (no change needed)

None — all 5 findings verified TRUE.

Not addressed here

🟡 listTools() runs twice per session start — packages/cli/src/mcp/index.ts:670-677: TRUE but substantially mitigated by the JSON-format guard (fix #3 above). toolEntries() now only runs when args.format==="json", so the double listTools() only affects json-mode sessions. A proper fix (memoizing listTools or having toolEntries derive from tools()) requires more careful coupling of tools()'s error-handling side-effects (s.status[clientName]=failed/delete s.clients[clientName]) with the catalog path. Deferring to a follow-up; the regression impact is now limited to json-mode startup latency.
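
For the deferred follow-up, one possible shape of the memoization is sketched below. `ToolLister` is a hypothetical minimal interface standing in for an MCP client, and the sketch deliberately ignores the `tools()` status side-effects mentioned above, which a real fix would still need to reconcile.

```typescript
// Hypothetical minimal client interface; real MCP clients return richer data.
interface ToolLister {
  listTools(): Promise<string[]>
}

// Wrap a client so repeated listTools() calls share one in-flight or settled result.
function memoizeListTools(client: ToolLister): ToolLister {
  let cached: Promise<string[]> | undefined
  return {
    listTools(): Promise<string[]> {
      if (cached === undefined) {
        cached = client.listTools().catch((err: unknown) => {
          // Drop the cache on failure so a later call can retry.
          cached = undefined
          throw err
        })
      }
      return cached
    },
  }
}
```

Caching the promise rather than the resolved value means two overlapping callers (catalog and first turn) also share a single round-trip.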

aictrl-dev · 2026-06-22T09:35:16Z

Code review

Verdict: Looks good — only minor / nit comments below. · 🔴 0 · 🟠 0 · 🟡 3 · ⚪ 2 · 0/5 resolved

⚪ EVENTS.md:17 — tool_catalog section placed before session_start
🟡 packages/cli/src/cli/cmd/run.ts:711 — Awaited catalog collection can block session start
🟡 packages/cli/src/cli/cmd/run.ts:718 — Catalog failure logged to stderr, not stdout event stream
🟡 packages/cli/src/mcp/index.ts:668 — Duplicate listTools() calls double MCP round-trips
⚪ packages/cli/src/skill/skill.ts:34 — version/metadata schema fields are type-only

🤖 Fix all 5 open findings with your agent

Fix the following code review findings on aictrl-dev/cli PR #88 (head branch).
Run the relevant tests/linters after each change.

1. EVENTS.md:17 — tool_catalog section placed before session_start
   Detail: The `tool_catalog` section is placed before `session_start` in EVENTS.md, but the event is emitted *after* `session_start` (the PR title and run.ts confirm this ordering). Documenting events in emission order makes the lifecycle narrative easier to follow. Consider moving the `tool_catalog` section to after `session_start`.
2. packages/cli/src/cli/cmd/run.ts:711 — Awaited catalog collection can block session start
   Detail: `buildToolCatalogItems()` is `await`ed before `loop()` starts. Inside, `MCP.toolEntries()` wraps each `client.listTools()` in `.catch()` for rejections, but a client that *hangs* (never resolves or rejects) will leave `Promise.all` pending indefinitely, stalling the entire session before the first turn. The `tools()` path has the same `listTools()` call but runs inside the agent loop where transport-level timeouts or loop guards may apply. Consider wrapping `toolEntries()` (or the `buildToolCatalogItems()` call in run.ts) with a bounded timeout so catalog collection cannot indefinitely block session start.
3. packages/cli/src/cli/cmd/run.ts:718 — Catalog failure logged to stderr, not stdout event stream
   Detail: When `buildToolCatalogItems()` rejects, the error is written to `console.error` (stderr). JSON consumers parse the NDJSON event stream on stdout and will not see this. The event's stated purpose is to detect "silent success" failure modes, but a catalog collection failure itself produces no structured signal on stdout — just a missing `tool_catalog` event plus an unstructured stderr line. Consider emitting a structured error/warning event (e.g. `tool_catalog` with an `error` field, or a dedicated `tool_catalog_error` event) so consumers reading stdout can distinguish "catalog failed" from "catalog succeeded with no MCP tools".
4. packages/cli/src/mcp/index.ts:668 — Duplicate listTools() calls double MCP round-trips
   Detail: `MCP.toolEntries()` calls `client.listTools()` on every connected client to build the catalog, and `MCP.tools()` later calls `client.listTools()` again on the same clients during `resolveTools` inside the agent loop. For JSON runs this means two full rounds of `listTools()` per MCP server before the first model turn. Consider memoising the `listTools()` result across both call sites, or having `toolEntries()` share state with the dispatch path.
5. packages/cli/src/skill/skill.ts:34 — version/metadata schema fields are type-only
   Detail: The `Info` schema adds `version: z.string().optional()` and `metadata: z.record(...)`, but the actual parse call uses `Info.pick({ name: true, description: true }).safeParse(md.data)`, so these fields are never validated by Zod — they exist only for the TypeScript type. The `version` field is particularly misleading: a contributor reading the schema might assume `version: 1` (YAML number) would be rejected, but it is silently accepted because `pick()` excludes it from validation. Consider adding a comment noting these are type-only, or validating them via a separate pick/parse if runtime validation is intended.

📋 Out-of-diff findings (5)

| Sev | Location | Finding |
| --- | --- | --- |
| ⚪ | `EVENTS.md:17` | tool_catalog section placed before session_start |
| 🟡 | `packages/cli/src/cli/cmd/run.ts:711` | Awaited catalog collection can block session start |
| 🟡 | `packages/cli/src/cli/cmd/run.ts:718` | Catalog failure logged to stderr, not stdout event stream |
| 🟡 | `packages/cli/src/mcp/index.ts:668` | Duplicate listTools() calls double MCP round-trips |
| ⚪ | `packages/cli/src/skill/skill.ts:34` | version/metadata schema fields are type-only |

Reviewed 6 files · 0 inline · view all 5 findings ↗

- EVENTS.md: reorder tool_catalog section after session_start to match actual emission order (bot finding: section placed before session_start)
- run.ts: wrap buildToolCatalogItems() with 10s timeout so a hanging MCP server cannot stall session start indefinitely; on failure emit a structured tool_catalog_error event to stdout instead of console.error to stderr, so JSON consumers can distinguish failure from empty catalog
- skill.ts: add comment clarifying that metadata/version are type-only fields not validated by the Info.pick() Zod parse; both are populated via manual Object.fromEntries coercion after parse succeeds
- test: add 3 regression tests for timeout, structured error path, and catalog rejection propagation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ

byapparov · 2026-06-22T09:46:05Z

Round 2 code review triage — PR #88 (tool_catalog session_start event)

Verdict: 4 FIX · 1 DEFER · 0 IGNORE

Fixes committed in 8dae0c934 on feat/tool-catalog-session-start.

Finding-by-finding

| # | Location | Sev | Verdict | Evidence |
| --- | --- | --- | --- | --- |
| 1 | `EVENTS.md:17` | ⚪ | FIX | `tool_catalog` section was above `session_start` in the doc; run.ts emits session_start first (line 701), then tool_catalog (line 715). Reordered to match emission order. |
| 2 | `run.ts:711` | 🟡 | FIX | TRUE. `toolEntries()` uses `client.listTools().catch()` which handles rejection but not hangs. `buildToolCatalogItems()` is awaited before `loop()`, so a hung MCP client stalls the session indefinitely. Fixed: wrapped with `withTimeout(..., 10_000)`. Regression test added. |
| 3 | `run.ts:718` | 🟡 | FIX | TRUE. On catalog failure the previous code did `console.error` to stderr; JSON consumers reading stdout see only a missing `tool_catalog` event, indistinguishable from "no tools". Fixed: catch now emits structured `tool_catalog_error` event to stdout with `{ error: string }`. Regression test added. |
| 4 | `mcp/index.ts:668` | 🟡 | DEFER | TRUE that `toolEntries()` and `tools()` both call `client.listTools()`. However, memoising across these two call sites (pre-loop vs per-turn inside `resolveTools`) requires shared session state and is a larger refactor with concurrency implications. The blocking hazard (finding 2) is fixed by the timeout. Deferring dedup to a separate issue. |
| 5 | `skill.ts:34` | ⚪ | FIX | TRUE. `Info.pick({ name, description })` excludes `metadata`/`version` from Zod parse; both are populated via manual `Object.fromEntries` coercion. A contributor reading the schema could mistakenly rely on Zod validation. Added explicit doc comment: "TYPE-ONLY — not validated by Info.pick() parse". |
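
The fixes for findings 2 and 3 combine a bounded wait with a structured failure event. The sketch below shows one way that could fit together; the body of `withTimeout`, the `Emit` signature, and the `emitCatalog` wrapper are assumptions for illustration, since only the `withTimeout(..., 10_000)` call and the `{ error: string }` payload are stated above.

```typescript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
  })
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer)
  })
}

// Assumed emitter signature: event type plus a JSON-serializable payload.
type Emit = (type: string, data: Record<string, unknown>) => void

// Emit tool_catalog on success, tool_catalog_error on rejection or timeout.
async function emitCatalog(
  build: () => Promise<Record<string, unknown>>,
  emit: Emit,
  timeoutMs = 10_000,
): Promise<void> {
  try {
    emit("tool_catalog", await withTimeout(build(), timeoutMs))
  } catch (err) {
    emit("tool_catalog_error", {
      error: err instanceof Error ? err.message : String(err),
    })
  }
}
```

Because the failure path writes to the same event stream, a consumer can tell "catalog failed" apart from "catalog succeeded with no MCP tools", and a hung client delays the first turn by at most the timeout.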

Test results

bun test test/tool-catalog.test.ts
14 pass, 0 fail  (was 11; +3 new regression tests)

bun run typecheck — clean (turbo: all 6 tasks successful)

byapparov self-assigned this Jun 22, 2026

byapparov added this to the Enterprise Observability milestone Jun 22, 2026

byapparov merged commit ab2401d into main Jun 22, 2026
5 checks passed

byapparov deleted the feat/tool-catalog-session-start branch June 22, 2026 11:18

byapparov merged 4 commits into main from feat/tool-catalog-session-start
